Abstract

In the classic view introduced by R.A. Fisher, a quantitative trait is encoded by many loci with 
small, additive effects. Recent advances in QTL mapping have begun to elucidate the genetic architectures
underlying vast numbers of phenotypes across diverse taxa, producing observations that
sometimes contrast with Fisher's blueprint. Despite these considerable empirical efforts to map the 
genetic determinants of traits, it remains poorly understood how the genetic architecture of a trait 
should evolve, or how it depends on the selection pressures on the trait. Here we develop a simple, 
population-genetic model for the evolution of genetic architectures. Our model predicts that traits 
under moderate selection should be encoded by many loci with highly variable effects, whereas traits 
under either weak or strong selection should be encoded by relatively few loci. We compare these 
theoretical predictions to qualitative trends in the genetics of human traits, and to systematic data on 
the genetics of gene expression levels in yeast. Our analysis provides an evolutionary explanation for 
broad empirical patterns in the genetic basis of traits, and it introduces a single framework that unifies 
the diversity of observed genetic architectures, ranging from Mendelian to Fisherian. 



A quantitative trait is encoded by a set of genetic loci whose alleles contribute directly to the trait value,
interact epistatically to modulate each other's contributions, and possibly contribute to other traits. The
resulting genetic architecture of a trait [1] influences its variational properties [2–5] and therefore affects
a population's capacity to adapt to new environmental conditions [1, 6, 7]. Over longer timescales, genetic
architectures of traits have important consequences for the evolution of recombination [8], of sex [9] and
even reproductive isolation and speciation [10].

Although scientists have studied the genetic basis of phenotypic variation for more than a century, 
recent technologies, as well as the promise of agricultural and medical applications, have stimulated 
tremendous efforts to map quantitative trait loci (QTL) in diverse taxa [11–19]. These studies have
revealed many traits that seem to rely on Fisherian architectures, with contributions from many loci
[20], whose additive effects are often so small that QTL studies lack power to detect them individually
[16, 21, 22]. Other traits, however, are encoded by a relatively small number of loci - including the large
number of human phenotypes with known Mendelian inheritance. 

The subtle statistical issues of designing and interpreting QTL studies in order to accurately infer the 
molecular determinants of a trait are already actively studied [16, 21, 22]. Nevertheless, distinct from these
statistical issues of inferences from empirical data, we lack a theoretical framework for forming a priori 
expectations about the genetic architecture underlying a trait [US]. For instance, what types of traits 
should we expect to be monogenic, and what traits should be highly polygenic? More generally, how does 
the genetic architecture underlying a trait evolve, and what features of a trait shape the evolution of its 
architecture? To address these questions we developed a mathematical model for the evolution of genetic 
architectures, and we compared its predictions to a large body of empirical data on quantitative traits.

Figure 1. The genetic architecture underlying a trait depends on the strength of selection on the trait, in a
population-genetic model. Traits subject to intermediate selection (intermediate values of $\sigma_f$) evolve
genetic architectures with the greatest number of controlling loci. Dots denote the mean number of loci in the
architecture underlying a trait, among 500 replicate Wright-Fisher simulations, for each value of the selection
pressure $\sigma_f$. The rectangular areas represent the distribution of the number of loci in the architecture.
The neutral expectations for the equilibrium number of loci (see Methods) are represented as grey lines, when
recruitment events are neutral (top line) or not (bottom line). Parameters are set to their default values
(table S2).



Results and Discussion 

Genetic architectures predicted by a population-genetic model 

Our approach to understanding the evolution of genetic architectures combines standard models from
quantitative genetics [23] with the Wright-Fisher model from population genetics [23]. In its simplest
version, our model considers a continuous trait whose value, x, is influenced by L loci. Each locus i
contributes additively an amount $\alpha_i$, so that the trait value is defined as the mean of the $\alpha_i$ values
across contributing loci, as in [25, 26]. This choice of trait definition, and alternatives such as the sum,
are discussed below. The fitness of an individual with trait value x is assumed Gaussian with mean 0
and standard deviation $\sigma_f$, so that smaller values of $\sigma_f$ correspond to stronger stabilizing selection
on the trait [23]. Individuals in a population of size N replicate according to their relative fitnesses.
Upon replication, an offspring may acquire a point mutation that alters the direct effect of one locus,
i, perturbing the value of $\alpha_i$ for the offspring by a normal deviate; or the offspring may experience a
duplication or a deletion in a contributing locus, which changes the number of loci L that control the
trait value in that individual (see Methods). Point mutations, duplications, and deletions occur at rates
$\mu$, $r_{dup}$, $r_{del}$, which have comparable magnitudes in nature [27–30] (table S1). Finally, an offspring may
also increase the number of loci that contribute to its trait value by recruitment - that is, by acquiring a
recruitment mutation, with probability $\mu \times r_{rec}$, in some gene that did not previously contribute to the
trait value (see Methods).
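
As a rough illustration of the simplest version of the model (no epistasis), the following Python sketch implements one Wright-Fisher generation with point mutations, duplications, deletions and recruitment. It is a minimal sketch, not the simulation code used for the results: the function names, the rule that an offspring never drops to zero loci, and the exaggerated rates in the usage lines are all assumptions made for illustration.

```python
import math
import random


def trait_value(alphas):
    """Trait value x: mean of the direct contributions (no epistasis)."""
    return sum(alphas) / len(alphas)


def fitness(x, sigma_f):
    """Unnormalized Gaussian fitness with optimum 0 and width sigma_f."""
    return math.exp(-(x * x) / (2.0 * sigma_f ** 2))


def mutate(alphas, rng, mu, r_dup, r_del, r_rec, n_other, sigma_m):
    """Copy a parental genotype, applying mutation events at random."""
    child = []
    for a in alphas:
        if rng.random() < mu:                # point mutation: normal deviate
            a += rng.gauss(0.0, sigma_m)
        u = rng.random()
        if u < r_del:                        # deletion: drop this locus
            continue
        child.append(a)
        if u < r_del + r_dup:                # duplication: copy this locus
            child.append(a)
    if rng.random() < n_other * mu * r_rec:  # recruitment of a new locus
        child.append(rng.gauss(0.0, sigma_m))
    # Assumption of this sketch: an offspring never ends up with zero loci.
    return child if child else list(alphas)


def next_generation(pop, rng, sigma_f, **rates):
    """One generation: fitness-weighted sampling of parents, then mutation."""
    weights = [fitness(trait_value(g), sigma_f) for g in pop]
    parents = rng.choices(pop, weights=weights, k=len(pop))
    return [mutate(p, rng, **rates) for p in parents]


rng = random.Random(1)
pop = [[0.0] for _ in range(200)]            # start with L = 1, alpha_1 = 0
rates = dict(mu=0.01, r_dup=0.01, r_del=0.0125, r_rec=0.001,
             n_other=10, sigma_m=0.5)        # exaggerated, illustrative values
for _ in range(100):
    pop = next_generation(pop, rng, sigma_f=1.0, **rates)
mean_L = sum(len(g) for g in pop) / len(pop)
```

Because only relative fitnesses matter for sampling parents, the Gaussian is left unnormalized here; `mean_L` tracks the average number of loci per individual.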

Over successive generations in our model, the genetic architecture underlying the trait - that is, 
how many loci contribute to the trait's value, and the extent of their contributions - varies among the 
individuals in the population, and evolves. The genetic architectures that evolve in our model represent 
the complete genetic determinants of a trait, as opposed to associations with genetic loci that would be 
detected based on polymorphisms segregating in a sample of individuals in a QTL study. We discuss this 
important distinction below, when we compare the predictions of our model to empirical QTL data. 

We studied the evolution of genetic architectures in sets of 500 replicate populations, simulated by 
Monte Carlo, with different amounts of selection on the trait. We ran each of these simulations for 50 million
generations, in order to model the extensive evolutionary divergence over which genetic architectures
are assembled in nature. The form of the genetic architecture that evolves in our model depends critically 
on the strength of selection on the trait. In particular, we found a striking non-monotonic pattern: the 
equilibrium number of loci that influence a trait is greatest when the strength of selection on the trait is 
intermediate (Fig. 1). Moreover, the variability in the contributions of loci to the trait value (Fig. S1)
and the effects of deleting or duplicating genes (Fig. S2) are also greatest for a trait under intermediate
selection. In other words, our model predicts that traits under moderate selection will be encoded by 
many loci with highly divergent effects; whereas traits under strong or weak selection will be encoded by 
relatively few loci. 

We also studied how epistatic interactions among loci influence the evolution of genetic architecture. 
To incorporate the influence of locus j on the contribution of locus i we introduced epistasis parameters
$\beta_{ji}$, so that the trait value is now given by

$$x = \frac{1}{L} \sum_{i=1}^{L} \alpha_i \, f_\beta\!\left( \sum_{j=1}^{L} \beta_{ji} \right) \tag{1}$$

where $f_\beta$ is a standard sigmoidal filter function [8] (see Methods and Fig. S4). As with the direct effects
of loci, the epistatic effects were allowed to mutate and vary within the population, and evolve. Although 
significant epistatic interactions emerge in the evolved populations (Fig. S3B), the presence of epistasis 
does not strongly affect the average number of loci that control a trait (Figs. S3A and S4). Epistasis is 
not required for the evolution of large L, nor does it change the shape of its dependence on the strength 
of selection. 

Intuition for the results 

There is an intuitive explanation for the non-monotonic relationship between the selection pressure on a 
trait and the number of loci that control it. For a trait under weak selection (high $\sigma_f$), changes in the
trait value have little effect on fitness. Thus, even if deletions, recruitments and duplications change the 
trait value, these changes are nearly neutral (Fig. 2). As a result, the number of loci controlling the trait 
evolves to its neutral equilibrium, which is small because deletions are more frequent than duplications 
and recruitments (see Methods, Figs. 1 and S3). On the other hand, when selection on a trait is very 
strong (low $\sigma_f$), few point mutations, and only those with small effects on the trait, will fix in the
population. As a result, all loci have similar contributions to the trait value (Fig. 2 - row 1), and so 
duplications or deletions again have little effect on the trait or on fitness (Fig. 2 - rows 2 and 3). In 
this case, the equilibrium number of loci is given by the value expected when deletions and duplications, 
but not recruitments, are neutral (Figs. 1 and S3). Only when selection on a trait is moderate can 
variation in the contributions across loci accrue and impact the fixation of deletions and duplications 
(Fig. 2 - row 4), by a process called compensation: a slightly deleterious point mutation at one locus, 
which perturbs the trait value, segregates long enough to be compensated by point mutations at other 
loci [31–34]. Compensation increases the variance in the contributions among loci (Fig. 2, row 1), as
has been observed for many phenotypes in plants and animals [35]. Finally, even though duplications 
and deletions are mildly deleterious in this regime, there is a bias favoring duplications over deletions 
(Fig. 2 - row 3). This bias arises because duplications increase the number of loci in the architecture, 
which attenuates the effect of each locus on the trait (Fig. 2 - row 2). Thus when selection is moderate, 
duplications and recruitments fix more often than deletions and drive the number of contributing loci 
above its neutral expectation (Fig. 2 - rows 4 and 5). As the number of loci increases the bias is reduced 
(Fig. 2 - rows 4 and 5), and so L equilibrates at a predictable value (Figs. 1 and S3). 
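
The attenuation argument can be made explicit in the simplest version of the model, where the trait value is the mean of the contributions and there is no epistasis. The short derivation below is a sketch under those assumptions; the symbols $\delta_{dup}$ and $\delta_{del}$ are notation introduced here for the change in trait value caused by duplicating or deleting locus i.

```latex
\begin{align*}
x &= \frac{1}{L}\sum_{j=1}^{L}\alpha_j,\\
\delta_{dup} &= \frac{L\,x + \alpha_i}{L+1} - x = \frac{\alpha_i - x}{L+1},\\
\delta_{del} &= \frac{L\,x - \alpha_i}{L-1} - x = \frac{x - \alpha_i}{L-1},\\
\frac{|\delta_{dup}|}{|\delta_{del}|} &= \frac{L-1}{L+1} < 1 .
\end{align*}
```

For the same locus, a duplication therefore perturbs the trait less than a deletion does, and the ratio approaches 1 as L grows, which is consistent with the bias shrinking as the number of loci increases.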

Robustness of results to model assumptions 

The predictions of our model - notably, that the number of loci in a genetic architecture is greatest for
traits under intermediate selection - are robust to choices of population-genetic parameters. The
non-monotonic relation between selection pressure on a trait and the size of its genetic architecture, L, holds
regardless of population size; but the location of maximum L is shifted towards weaker selection in larger
populations (Fig. S5). This result is compatible with our explanation involving compensatory evolution:
selection is more efficient in large populations, and so compensatory evolution occurs at smaller selection
coefficients. Likewise, when the mutation rate is smaller the resulting equilibrium number of controlling
loci is reduced (Fig. S6). This result is again compatible with the explanation of compensatory evolution,
which requires frequent mutations. Increasing the rate of deletions relative to duplications also reduces
the equilibrium number of loci in the genetic architecture, but our qualitative results are not affected
even when $r_{del}$ is twice as large as $r_{dup}$ (Fig. S7). Finally, increasing the rate of recruitment $r_{rec}$ (or
the genome size) increases the number of loci contributing to all traits except those under very strong
selection, as expected from Fig. 2. Our prediction that traits under intermediate selection are encoded
by the richest genetic architectures is insensitive to changes in this parameter, and it holds even in the
absence of recruitment (Fig. S8).

Figure 2. The consequences of gene duplications, recruitments and deletions in a population-genetic model. Populations
were initially evolved with a fixed number of controlling loci L (line 1), and we then measured the effects of recruitments,
deletions and duplications on the trait value (line 2) and on fitness (line 3). From the latter, we calculated the rate at
which deletions, recruitments and duplications enter and fix in the population (line 4), and the resulting rate of change in the
number of loci contributing to the trait (line 5). Line 1: For L > 1, the variation in direct effects ($\alpha_i$) and indirect effects
among controlling loci ($\sum_j \beta_{ji}$) increases as selection on the trait is relaxed. Line 2: As a consequence of this variation
among loci, the average change in the trait value following a duplication or a deletion also increases as selection on the trait
is relaxed. Line 3: Changes in the trait value are not directly proportional to fitness costs, because the same change in x has
milder fitness consequences when selection is weaker (larger $\sigma_f$). As a result, the average fitness detriment of duplications
and deletions is highest for traits under intermediate selection. Line 4: Consequently, the fixation rates of duplications and
deletions are smallest under intermediate selection. Line 5: The equilibrium number of loci controlling a trait under a given
strength of selection is determined by that value of L for which duplications and recruitments on one side, and deletions on
the other, enter and fix in the population at the same rate. For example, when $\sigma_f = 10^{-1.5}$ these rates are equal when L is
close to 12 (black arrow), so that the equilibrium genetic architecture contains ~12 loci on average (compare Fig. S3 black
arrow).

Our analysis has relied on several quantitative-genetic assumptions, which can be relaxed. First, we 
assumed that all effects of locus i (i.e. $\alpha_i$ and all $\beta_{ij}$ and $\beta_{ji}$) are simultaneously perturbed by a point
mutation. Relaxing this assumption, so that a subset of the effects are perturbed, does not change our 
results qualitatively (Fig. S9). Second, we assumed that point mutations have unbounded effects so 
that variation across loci can increase indefinitely. To relax this assumption we made mutations less 
perturbative to loci with large effects (see Methods). Even a strong mutation bias of this type led to 
very small changes in the equilibrium behavior (Fig. S10). Third, we assumed no metabolic cost of 
additional loci, even though additional genes in Saccharomyces cerevisiae are known to decrease fitness 
slightly [36, 37]. Nonetheless, including a metabolic cost proportional to L does not alter our qualitative
predictions (Fig. S11). Finally, we defined the trait value as the average of the contributions $\alpha_i$ across loci,
as opposed to their sum. This definition reflects the intuitive notion that a gene product's contribution to 
a trait will generally depend on its abundance relative to all other contributing gene products. Moreover, 
this assumption that increasing the number of loci influencing a trait attenuates the effect of each one is 
supported by empirical data: changing a gene's copy number is known to have milder phenotypic effects 
when the gene has many duplicates [38, 39]. Nonetheless, alternative definitions of the trait value, which
span from the sum to the average of contributions across loci, generically exhibit the same qualitative 
results (text S1 and Fig. S12).

Although robust to model formulation and parameter values, our results do depend in part on initial 
conditions. When selection is strong, the initial genetic architecture can affect the evolutionary dynamics 
of the number of loci (Fig. S14). This occurs because the initial architecture may set dependencies 
among loci that prevent a reduction of their number. This result indicates that only those architectures 
of traits under very strong selection should depend on historical contingencies. We have also studied a 
multitrait version of our model, where genes participating in other traits can be recruited or lost through 
mutation. Even though this model features pleiotropy, and the effects of recruitments evolve neutrally, 
our qualitative results remained unaffected (text S3 and Fig. S15). 

The dynamics of copy number 

Previous models related to genetic architecture have been used to study the evolutionary fate of gene 
duplicates. These models typically assume that a gene has several sub-functions, which can be gained 
(neo-functionalization) [3D] or lost (sub-functionalization) [41, 42] in one of two copies of a gene. Such
"fate-determining mutations" [43] stabilize the two copies, as they make subsequent deletions deleterious. 
Such models complement our approach, by providing insight into the evolution of discrete, as opposed to 
continuous or quantitative, phenotypes. Yet there are several qualitative differences between our analysis 
and previous studies of gene duplication. Most important, our model considers the dynamics of both 
duplications and deletions, in the presence of point mutations that perturb the contributions of loci to 
a trait. This coincidence of timescales is important in the light of empirical data [27–30] showing that
changes in copy numbers occur at similar rates as point mutations (table S1). Under these circumstances,
a gene may be deleted or acquire a loss-of-function mutation before a new function is gained or lost. Our 
model includes these realistic rates, and accordingly we find that duplicates are very rarely stabilized by 
subsequent point mutations. Instead, the number of loci in a genetic architecture may increase, in our
model, because compensatory point mutations introduce a bias towards the fixation of duplications as
opposed to deletions. 

Comparison to empirical eQTL data 

Like most evolutionary models, our analysis greatly simplifies the mechanistic details of how specific 
traits influence fitness in specific organisms. As a result, our analysis is general, but it is intended to 
explain only the broadest qualitative features of how genetic architectures vary among phenotypic traits. 
Moreover, our model predicts substantial variation in the number of loci controlling a trait under any 
given selection pressure (Fig. 1). Nonetheless, the overall non-monotonic pattern predicted by our model 
helps to explain well-known trends in genetics of human traits: traits under moderate selection, such as 
stature or susceptibility to mid-life diseases like diabetes, cancer, or heart disease, are typically complex
and highly polygenic; whereas traits under very strong selection, such as childhood-lethal diseases like
cystic fibrosis or haemophilias, and traits under very weak selection, such as handedness, bitter taste,
or hitchhiker's thumb, are often Mendelian. Our analysis provides an evolutionary explanation for these 
general trends, and it delineates the selective conditions under which we should expect a Mendelian, as 
opposed to Fisherian, architecture. 

We tested our evolutionary model of genetic architectures by comparison with empirical data on a 
large number of traits. Such a comparison must, of course, account for the fact that our model describes 
the true genetic architecture underlying a trait, whereas any QTL study has limited power and describes 
only the associations detected from polymorphisms segregating in a particular sample of individuals.
Accounting for this discrepancy (see below), we compared our model to data from the study of Brem et
al. [15], who measured mRNA expression levels and genetic markers in 112 recombinant strains produced
from two divergent lines of S. cerevisiae. For each yeast transcript we computed the number of
non-contiguous markers associated with transcript level, at a given false discovery rate (see Methods). We
also calculated the codon adaptation index (CAI) of each transcript - an index that correlates with 
the gene's wildtype expression level and with its overall importance to cellular fitness [33]. We found a 
striking, non-monotonic relationship between the CAI of a transcript and the number of loci linked to 
variation in its abundance (Fig. 3A). Thus, assuming that CAI correlates with the strength of selection on 
a transcript, Brem et al. [15] detected more loci regulating yeast transcripts under intermediate selection
than transcripts under either strong or weak selection. 

We compared the empirical data on yeast eQTLs (Fig. 3A) to the predictions of our evolutionary 
model. In order to make this comparison, we first evolved genetic architectures for traits under various 
amounts of selection (Fig. S3), and for each architecture we then simulated a QTL study of the exact 
same type and power as the yeast eQTL study: that is, we generated 112 crosses from two divergent lines 
using the yeast genetic map (text S2). As expected, the simulated QTL studies based on these 112 segregants
detected many fewer loci linked to a trait than in fact contribute to the trait in the true, underlying genetic 
architecture (Fig. 3B versus Fig. 1). This result is consistent with previous interpretations of empirical 
eQTL studies [16]. The simulated QTL studies revealed another important bias: a locus that contributes 
to a trait under weak selection is more likely to be correctly identified in a QTL study than a locus that 
contributes to a trait under strong selection (Fig. S16). Furthermore, our simulations demonstrate that 
the number of associations detected in such a QTL study depends on the divergence time between the 
parental strains used to generate recombinant lines (Fig. S17). Finally, traits under weaker selection may 
be more prone to measurement noise, which we also simulated (Fig. S18). Despite these detection biases, 
which we have quantified, the relationship between the selection pressure on a trait and the number of 
detected QTLs in our model (Fig. 3B and Figs. S18 and S19) agrees with the relationship observed in the 
yeast eQTL data (Fig. 3A). Importantly, both of these relationships exhibit the same qualitative trend: 
traits under intermediate selection are encoded by the richest genetic architectures.

Figure 3. The number of genetic loci controlling a trait inferred from real S. cerevisiae populations (panel A, Experiment) and from
simulated populations (panel B, Simulations) has a non-monotonic relationship with the strength of selection on the trait. A: In the yeast data
of Brem et al. [15], the largest number of eQTLs were detected for those transcripts (i.e., traits) under intermediate levels
of selection (intermediate CAI), whereas fewer eQTLs were detected for transcripts under either weak or strong selection.
Transcripts were binned according to their log CAI values. Squares represent the distribution of the number of one-way
eQTLs identified from the study of Brem et al. [15], for traits within each bin of CAI. Greyscale indicates the number of
transcripts in each bin (darker means more data). Mean numbers of detected eQTLs are represented by circles. B: For
the simulated experiment, we evolved 100 populations of genetic architectures, using the parameters corresponding to Fig.
S3. From each such population, we then evolved two lines independently for 25,000 generations in the absence of deletions,
duplications and recruitment, to mimic the divergent strains used in the yeast cross of Brem et al. [15]. From these two
divergent genotypes we then created 112 recombinant lines following the genetic map from Brem et al. [15]. We then analyzed
the resulting simulated data with R/qtl in the same way as we had analyzed the yeast data (text S2). The distribution of
QTLs detected and their means are represented as in Fig. 1, for each value of selection strength $\sigma_f$.



Conclusion 



Many interesting developments lie ahead. Our model is far too simple to account for tissue- and
time-specific gene expression, dominance, context-dependent effects, etc. [5, 35]. How these complexities will
change predictions for the evolution of genetic architectures remains an open question. Nonetheless, our 
analysis shows that it is possible to study the evolution of genetic architecture from first principles, to 
form a priori expectations for the architectures underlying different traits, and to reconcile these theories 
with the expanding body of QTL studies on molecular, cellular, and organismal phenotypes.

Methods



Model 

We described the evolution of genetic architectures using the Wright-Fisher model of a replicating
population of size N, in which haploid individuals are chosen to reproduce each generation according to their
relative fitnesses. The fitness of an individual with L loci encoding trait value x is

$$\omega_k = G(x, 0, \sigma_f) \times (1 - L \times c) \tag{2}$$

where G denotes the density at x of a Gaussian distribution with mean 0 and standard deviation $\sigma_f$, and
the second term denotes the metabolic cost of harboring L loci, which depends on a parameter c. The
trait value of such an individual, given the direct contributions $\alpha_i$ and epistatic terms $\beta_{ji}$, is described by
Eq. (1) where

$$f_\beta(y) = \frac{2}{1 + e^{s_\beta y}} \tag{3}$$

is a sigmoidal curve, so that the epistatic interactions either diminish or augment the direct contribution
of locus i depending on whether $\sum_j \beta_{ji}$ is positive or negative (Fig. S4). In general, loci do not influence
themselves ($\beta_{ii} = 0$) and, in the model without epistasis, all $\beta_{ji} = 0$ and $f_\beta = 1$. If an individual chosen
to reproduce experiences a duplication at locus i then the new duplicate, labelled k, inherits its direct
effect ($\alpha_k = \alpha_i$) and all interaction terms ($\beta_{kj} = \beta_{ij}$ and $\beta_{jk} = \beta_{ji}$ for all $j \neq i, k$), with the interaction
terms $\beta_{ik}$ and $\beta_{ki}$ initially set to zero. Recruitment occurs with probability $r_{rec}$ per mutation of one of
the 6,000 genes not contributing to the trait. The initial direct contribution $\alpha_i$ of recruited locus i is
drawn from a normal distribution with mean zero and standard deviation $\sigma_m$; its interaction terms with
other loci (k), $\beta_{ik}$ and $\beta_{ki}$, are initially set to zero. Note that this assumption is relaxed in the multitrait
version of our model, where the direct and indirect effects of recruitments evolve neutrally (text S3 and
Fig. S15).
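
The trait value with epistasis and the duplication rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the filter is assumed to take the form 2 / (1 + exp(s_beta * y)), chosen so that it equals 1 when all epistatic terms are zero, and its exact form and sign convention, the default `s_beta`, and the function names are assumptions rather than the authors' code. Here `beta[j][i]` stands for the influence of locus j on locus i.

```python
import math


def f_beta(y, s_beta=1.0):
    """Assumed sigmoidal filter: equals 1 at y = 0 and decreases with y."""
    return 2.0 / (1.0 + math.exp(s_beta * y))


def trait_value(alpha, beta, s_beta=1.0):
    """Mean over loci of alpha[i] scaled by the filtered sum of beta[j][i]."""
    L = len(alpha)
    total = 0.0
    for i in range(L):
        incoming = sum(beta[j][i] for j in range(L) if j != i)
        total += alpha[i] * f_beta(incoming, s_beta)
    return total / L


def duplicate(alpha, beta, i):
    """Append a copy k of locus i that inherits its effects and interactions.

    The terms beta[i][k], beta[k][i] and beta[k][k] are left at zero.
    """
    L = len(alpha)
    k = L
    new_alpha = list(alpha) + [alpha[i]]
    new_beta = [list(row) + [0.0] for row in beta] + [[0.0] * (L + 1)]
    for j in range(L):
        if j != i:
            new_beta[k][j] = beta[i][j]   # effect of the copy on locus j
            new_beta[j][k] = beta[j][i]   # effect of locus j on the copy
    return new_alpha, new_beta


alpha = [1.0, 3.0]
no_epistasis = [[0.0, 0.0], [0.0, 0.0]]
x_plain = trait_value(alpha, no_epistasis)             # plain mean of alpha
alpha2, beta2 = duplicate(alpha, [[0.0, 0.5], [-0.2, 0.0]], 0)
```

With all epistatic terms at zero the function reduces to the plain mean of the direct contributions, as in the simplest version of the model.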

In general a point mutation at locus i changes its contribution to the trait, $\alpha_i$, and all its epistatic
interactions, $\beta_{ij}$ and $\beta_{ji}$, each by an independent amount drawn from a normal distribution with mean
zero and standard deviation $\sigma_m$. The normal distribution satisfies the assumptions that small mutations
are more frequent than large ones [46, 47], and that there is no mutation pressure on the trait [23].
We relaxed the former assumption by drawing mutational effects from a uniform distribution without
qualitative changes to our results (Fig. S13). In order to relax the latter assumption we included a bias
towards smaller mutations in loci with large effects, so that the mean effect of a mutation at locus i now
equals $-b_\alpha \times \alpha_i$ and $-b_\beta \times \beta_{ij}$, respectively for $\alpha_i$ and $\beta_{ij}$ [S5]. We also considered a model in which a
mutation at locus i affects only a proportion $p_{em}$ of the values $\alpha_i$, $\beta_{ij}$, and $\beta_{ji}$. By default, simulations
were initialized with L = 1 and $\alpha_1 = 0$; alternative initial conditions were also studied, as shown in Fig.
S14.



Markov chain for neutral changes in copy number 

When deletions and duplications are neutral, and recruitments strongly deleterious, the evolution of the
number of loci L in the genetic architecture is described by a Markov chain on the positive integers. The
probability of a transition from L = i to L = i + 1 equals $r_{dup} \times i$, and that of a transition from i to i − 1
is $r_{del} \times i$. We disallow transitions to L = 0, assuming that some regulation of the trait is required. We
obtained the stationary distribution of L by setting the density $d_1$ of individuals in stage 1 to 1 and
calculating the density $d_i$ of individuals in the following stages as

$$d_i = \frac{r_{dup} \times (i - 1)}{r_{del} \times i} \, d_{i-1} \tag{4}$$

The equilibrium probability of being in state i was calculated as

$$p_i = \frac{d_i}{\sum_j d_j} \tag{5}$$

and the expected value of L was calculated as $\sum_i i \times p_i$. With $r_{dup} = 10^{-6}$ and $r_{del} = 1.25 \times 10^{-6}$, we
found an equilibrium expected L of 2.485.
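
As a consistency check, the neutral expectation has a closed form when the chain is not truncated and duplications are rarer than deletions. The derivation below is a sketch under those assumptions, starting from the recursion $d_i = \frac{r_{dup}(i-1)}{r_{del}\, i} d_{i-1}$ with $d_1 = 1$; the symbol $\rho$ is introduced here for the ratio of the two rates.

```latex
\begin{align*}
\rho &= \frac{r_{dup}}{r_{del}} < 1, \qquad
d_i = \rho\,\frac{i-1}{i}\,d_{i-1} \;\Rightarrow\; d_i = \frac{\rho^{\,i-1}}{i},\\
\sum_{i \ge 1} d_i &= \frac{1}{\rho}\sum_{i \ge 1}\frac{\rho^{\,i}}{i}
  = \frac{-\ln(1-\rho)}{\rho},
\qquad
\sum_{i \ge 1} i\,d_i = \sum_{i \ge 1}\rho^{\,i-1} = \frac{1}{1-\rho},\\
E[L] &= \frac{\sum_i i\,d_i}{\sum_i d_i}
  = \frac{\rho}{(1-\rho)\,\bigl(-\ln(1-\rho)\bigr)}
  \approx \frac{0.8}{0.2 \times 1.6094} \approx 2.485
  \quad (\rho = 0.8).
\end{align*}
```

The value for $\rho = 10^{-6} / (1.25 \times 10^{-6}) = 0.8$ agrees with the expected L of 2.485 reported above.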

When deletions, duplications and recruitments are all neutral, equation (4) can be replaced by:

$$d_i = \frac{r_{dup} \times (i - 1) + 6000 \times \mu \times r_{rec}}{r_{del} \times i} \, d_{i-1} \tag{6}$$

This equation illustrates the fact that the rates of deletions (which include loss of function mutations)
and duplication depend on the number of loci in the architecture, whereas the rate of recruitments does
not. With $\mu = 3 \times 10^{-6}$ and $r_{rec} = 5 \times 10^{-5}$, we found an equilibrium expected L of 4.705.
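
Both neutral expectations can be checked numerically by iterating the recursion for the stationary densities and normalizing. The Python sketch below is illustrative: the function name, the truncation at `max_loci`, and the `recruit_rate` argument (standing for 6000 × μ × r_rec) are assumptions, and the recruitment rate r_rec is taken as 5 × 10⁻⁵, which is treated here as an assumption that reproduces the reported expectation.

```python
def expected_loci(r_dup, r_del, recruit_rate=0.0, max_loci=2000):
    """Expected number of loci at the stationary state of the neutral chain.

    d_1 is set to 1 and each following density is obtained from the
    previous one; the chain is truncated at max_loci (an assumption).
    """
    d = 1.0
    total = d            # running sum of d_i
    weighted = d         # running sum of i * d_i
    for i in range(2, max_loci + 1):
        d *= (r_dup * (i - 1) + recruit_rate) / (r_del * i)
        total += d
        weighted += i * d
    return weighted / total


# Deletions and duplications neutral, recruitment excluded.
L_no_recruitment = expected_loci(1e-6, 1.25e-6)

# All three event types neutral: 6000 genes, mu = 3e-6, assumed r_rec = 5e-5.
L_with_recruitment = expected_loci(1e-6, 1.25e-6,
                                   recruit_rate=6000 * 3e-6 * 5e-5)
```

Because the ratio of duplication to deletion rates is below 1, the densities decay geometrically and the truncation has no visible effect on the result.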



Calculation of s and $p_{fix}$

We first evolved populations to equilibrium with a fixed number of controlling loci L, and we then
measured the effects of deletions, duplications or recruitments introduced randomly into the population.
We simulated the evolution of the genetic architecture with L fixed in 500 replicate populations, over
$8 \times 10^6$ generations for deletions and $10 \times 10^6$ generations for duplications, reflecting the unequal waiting
time before the two kinds of events. We used $10 \times 10^6$ generations for recruitment as well, although different
durations did not affect our results. For each genotype k in each evolved population, we calculated the
fitness $\omega_k(i)$ of mutants with locus i deleted or duplicated. We calculated the corresponding selection
coefficients as:

$$s_k(i) = \frac{\omega_k(i)}{\langle \omega \rangle} - 1 \tag{7}$$

where $\langle \omega \rangle$ denotes mean fitness in the population. We calculated s as the mean across loci and genotypes
of $s_k(i)$, weighted by the number of individuals with each genotype. We calculated the probability
of fixation of a duplication, deletion or recruitment as

$$p_{fix}(s_k(i)) = \frac{1 - e^{-2 s_k(i)}}{1 - e^{-2 N s_k(i)}} \tag{8}$$

and obtained the mean using the same method as for s.
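
The selection coefficient and fixation probability can be computed as in the following Python sketch. It is a minimal illustration rather than the authors' code: the function names are hypothetical, the neutral limit 1/N is used when s is numerically zero, and strongly deleterious mutants whose exponent would overflow are assigned a fixation probability of zero.

```python
import math


def selection_coefficient(w_mutant, w_mean):
    """Relative fitness advantage of a mutant over the population mean."""
    return w_mutant / w_mean - 1.0


def p_fix(s, N):
    """Fixation probability (1 - exp(-2s)) / (1 - exp(-2Ns))."""
    if abs(s) < 1e-12:
        return 1.0 / N                   # neutral limit
    try:
        # expm1 keeps precision for small s; the two minus signs cancel.
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)
    except OverflowError:
        return 0.0                       # strongly deleterious mutant


def weighted_mean(values, counts):
    """Mean of values weighted by the number of individuals per genotype."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


s = selection_coefficient(1.01, 1.0)     # mutant 1% fitter than average
p_beneficial = p_fix(s, 1000)
p_neutral = p_fix(0.0, 1000)
p_deleterious = p_fix(-0.01, 1000)
```

A mutant with a small advantage fixes more often than the neutral expectation 1/N, and a slightly deleterious one far less often, which is the asymmetry that drives the fixation rates of duplications and deletions.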

Rates of deletions and duplications fixing were calculated per locus (Fig. 2) as $r_{del}$ or $r_{dup}$ times $p_{fix}$.
The total probability of a duplication or a deletion entering the population and fixing is, of course, also 
multiplied by L. However, recruitment rates remain constant as L changes. Therefore, we divided the 
rate of recruitments by L in Fig. 2, for comparison to the per-locus duplication and deletion rates. 



Number of loci influencing yeast transcript abundance 

We used the R/qtl [49, 50] package to calculate LOD scores for a set of 1226 observed markers and 3223
uniformly distributed pseudomarkers separated by 2 cM, by Haley-Knott regression. We calculated the
LOD significance threshold for a false discovery rate (FDR) of 0.2 as the corresponding quantile in the
distribution of the maximum LOD after 500 permutations (an FDR of 0.01 and a fixed LOD threshold
of 3 produced qualitatively similar results). The number of detected loci linked to the expression of a 
transcript was calculated as the number of non-consecutive genomic regions with a LOD score above the 
threshold. We downloaded S. cerevisiae coding sequences from the Ensembl database (EF3 release), and 
calculated CAI values with the seqinr [51] package, using codon weights from a set of 134 ribosomal genes.
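
Counting non-consecutive genomic regions above the LOD threshold amounts to counting maximal runs of adjacent markers that exceed it. The Python sketch below illustrates that step only; the function name is hypothetical, and it assumes the LOD scores are ordered along the genome and ignores chromosome boundaries, which a full analysis would need to handle.

```python
def count_qtl_regions(lod_scores, threshold):
    """Count maximal runs of consecutive markers with LOD above threshold."""
    count = 0
    inside = False                       # currently inside a detected region?
    for lod in lod_scores:
        above = lod > threshold
        if above and not inside:
            count += 1                   # a new region starts here
        inside = above
    return count


# Two separate regions exceed a threshold of 3: markers 2-3 and marker 6.
n_regions = count_qtl_regions([0.5, 3.2, 4.1, 1.0, 0.2, 5.0, 0.1], 3.0)
```

Adjacent markers above the threshold are merged into one region, so a single strong peak spanning several pseudomarkers is counted once.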